National Repository of Grey Literature: 16 records found, showing 1 - 10
Voice Dialog System in Web Browser for Demonstration Purposes
Vlček, Pavol ; Glembek, Ondřej (referee) ; Schwarz, Petr (advisor)
The aim of this thesis is to design and build a voice-controlled assistant (voicebot) that can be easily deployed on a web page. This gives users a modern way to communicate naturally through a web browser. The main emphasis is on synchronization between the voice assistant and the content of the web page. Synchronization is achieved by bidirectional transmission of voice and text commands between client and server, using WebRTC in combination with the SIP signalling protocol. The thesis covers areas such as VoIP telephony, computer networks, and machine learning (proprietary speech technologies from Phonexia). The benefits of deploying the voice assistant are lower costs of outgoing calls for clients, a lighter load on call-centre agents answering frequently asked questions, and increased customer interest thanks to the use of new technologies.
New Techniques in Neural Networks Training - Connectionist Temporal Classification
Gajdár, Matúš ; Švec, Ján (referee) ; Karafiát, Martin (advisor)
This bachelor's thesis deals with neural networks and their use in speech recognition. It first covers the theory of speech recognition and then the theory of neural networks in connection with the connectionist temporal classification method. The next chapter introduces the toolkits used for training the neural networks and the experiments carried out with them to determine the impact of the connectionist temporal classification method on the precision of phoneme decoding. The last chapter summarizes the work and gives an overall evaluation of the experiments.
Automatic speech recordings segmentation tool
Santa, Roman ; Zvončák, Vojtěch (referee) ; Kováč, Daniel (advisor)
The automatic segmentation tool processes speech recordings and extracts the spoken words from them. It is important that advanced analysis works only with the speech parts of a recording. The segmentation tool is meant to ease the processing of recordings for the analysis of differences between the speech sounds of patients with Parkinson's disease and those of healthy speakers. The aim of this thesis is to design and test speech detectors together with the Google WebRTC detector and to select the most suitable one, with the minimum number of errors. A further aim is to create a tool for segmenting recordings and to test speech recognition using dynamic time warping. A database provided by a laboratory for the analysis of brain diseases was used. It contains Czech and Hungarian recordings with equal numbers of male and female patients and equal numbers of healthy patients and patients with Parkinson's disease. The energy-based speech detector achieved the best results in the tests. No difference in detector accuracy was found between male and female recordings, or between recordings of healthy and ill patients. Recordings with a low signal-to-noise ratio were harder to process, with error rates from 12%. Based on the results, a new detector for processing a complete recording was proposed. Finally, an algorithm for recognizing speech similarity based on mel-frequency cepstral coefficients was tested.
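The abstract above reports that an energy-based speech detector performed best but gives no implementation details. The following is only a minimal sketch of the general idea, not the thesis's detector: the function name `energy_vad`, the frame length, and the threshold of 30 dB below the loudest frame are all assumptions chosen for illustration.

```python
import math


def energy_vad(samples, frame_len=400, threshold_db=-30.0):
    """Mark each non-overlapping frame as speech (True) or non-speech (False).

    A frame counts as speech when its mean power in dB lies within
    `threshold_db` of the loudest frame. All parameters are illustrative
    assumptions, not values taken from the thesis.
    """
    energies_db = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # A small floor avoids log10(0) on digital silence.
        energies_db.append(10.0 * math.log10(power + 1e-12))
    if not energies_db:
        return []
    peak = max(energies_db)
    return [e >= peak + threshold_db for e in energies_db]


# Usage: 800 samples of silence, 800 samples of a 440 Hz tone, 800 of silence.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
decisions = energy_vad(signal)
```

A relative threshold keeps the sketch independent of recording gain; a practical detector would typically add smoothing across frames and a noise-floor estimate.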
Measurement of Impact of Environment Acoustics on Speech Recognition Accuracy
Paliesek, Jakub ; Žmolíková, Kateřina (referee) ; Szőke, Igor (advisor)
This bachelor's thesis investigates the impact of acoustic parameters on automatic speech recognition (ASR) accuracy. The ASR systems used were evaluated on the Speecon, Temic and LibriSpeech corpora. The work compares different versions of these data, which were created by retransmission in several rooms and by artificial retransmission using impulse responses. The impulse responses were obtained with the exponential sine sweep (ESS) and maximum length sequence (MLS) methods for real rooms, and with the image source model (ISM) method, which generates artificial impulse responses. The output of the thesis is a comparison of these types of retransmission. For the ESS method, ASR accuracy is examined for different lengths of the excitation signal. Furthermore, the impact of the relative position of source and receiver, the presence of barriers, and the directionality of microphones is studied.
Multi-Modal Text Recognition
Kabáč, Michal ; Herout, Adam (referee) ; Kišš, Martin (advisor)
The aim of this thesis is to describe and create a method for correcting text recognizer outputs using speech recognition. The thesis presents an overview of current methods for text and speech recognition using neural networks. It also presents a few existing methods of combining the outputs of two modalities. Within the thesis, several approaches to correcting recognizer outputs, based on algorithms or neural networks, are designed and implemented. An algorithm that searches the recognizer outputs using Levenshtein alignment proved to be the best approach. It scans the outputs if the uncertainty of a character from the text recognizer is less than a preselected limit. As part of the work, an annotation server for the text transcripts was created and used to collect recordings for the evaluation of experiments.
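The abstract above names Levenshtein alignment as the basis of the best-performing approach without describing it further. As a hedged illustration of that building block only, the sketch below aligns two strings character by character using the standard edit-distance table and a backtrace. The function name `levenshtein_align` and the use of `None` for gaps are assumptions of this sketch, and the thesis's actual correction rule is not reproduced here.

```python
def levenshtein_align(a, b):
    """Return (edit distance, alignment) for two sequences.

    The alignment is a list of (item_from_a, item_from_b) pairs, where
    None marks a gap (an insertion or a deletion).
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion from a
                dist[i][j - 1] + 1,         # insertion from b
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


# Usage: align a hypothetical text-recognizer output with a speech transcript.
distance, alignment = levenshtein_align("kitten", "sitting")
```

Once the two outputs are aligned, a correction step can compare the paired characters position by position, for example to decide which recognizer to trust at each position.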
Fixed-Point Implementation Speech Recognizer
Král, Tomáš ; Černocký, Jan (referee) ; Burget, Lukáš (advisor)
This master's thesis addresses automatic speech recognition on systems with restricted hardware resources, i.e. embedded systems. The objective was to design and implement a speech recognition system for embedded systems that lack a floating-point unit. The first task was to choose a suitable hardware architecture. The recognition system was then designed based on the available hardware resources. During development, the individual components were optimized so that they could run on the chosen hardware. The result of the project is successful recognition of Czech numerals on an embedded system.
Implementation of Simple Speech Recognizer in Android
Čuba, Eduard ; Glembek, Ondřej (referee) ; Szőke, Igor (advisor)
The goal of this project is to implement speech recognition software for the Android platform. This paper outlines the fundamental components of a speech recognizer and reviews the techniques used to optimize speech recognition on Android devices. First, it examines the implementation of acoustic feature extraction and phoneme estimation. Then, it describes the design and implementation of a decoder that turns phoneme estimates into a transcription using only the limited resources of a mobile device. The project is divided into several modules that form an Android library, which should be easy to extend and can be supplied with custom models tailored to the desired use. Finally, the paper discusses various approaches to modeling abstract data structures for representing the recognition network, as well as directions for further development and applications of the project.
Impact of Environment Acoustics on Speech Recognition Accuracy
Paliesek, Jakub ; Karafiát, Martin (referee) ; Szőke, Igor (advisor)
This diploma thesis deals with the impact of room acoustics on automatic speech recognition (ASR) accuracy. Experiments were evaluated on the LibriSpeech speech corpus and on ReverbDB, a database of impulse responses and noise. The ASR systems used were based on the Mini LibriSpeech recipe for Kaldi. First, it was examined how well an ASR system can learn to transcribe in selected environments when the same acoustic conditions are used during training and testing. Next, experiments were carried out with modifications of the ASR architecture in order to achieve better robustness to new conditions, using methods for adaptation to room acoustics: r-vectors and i-vectors. It was shown that the recently proposed r-vector method is beneficial even when real impulse responses are used for data augmentation.
